This paper is dedicated to our parents and family.

Abstract. In this paper we show the probabilistic convergence of the original Collatz (3n + 1) (or Hotpo) sequence to unity. A generalized form of the Collatz sequence (GCS) is then proposed. Unlike Hotpo, an instance of a GCS can converge to integers other than unity. A GCS can be generated using the concept of an abstract machine performing arithmetic operations in different numerical bases. The original Collatz sequence is then proved to be a special case of the GCS in base 2. The stopping time of GCS sequences is shown to possess remarkable statistical behavior. We conjecture that Collatz convergence reflects the existence of attractor points in the digital chaos generated by arithmetic operations on numbers. We also model Collatz convergence as a classical ruin problem on the digits of a number in the base in which the abstract machine is computing, and establish its statistical behavior. Finally, an average bound on the stopping time of the sequence is established that grows linearly with the number of digits.

1. Description Of The Problem: Collatz Conjecture

The original Collatz function is defined as:

$$
f(n) = \begin{cases} 3n + 1, & n \equiv 1 \pmod 2 \\ n/2, & n \equiv 0 \pmod 2 \end{cases} \tag{1}
$$

Beginning with any positive integer and taking the result of (1) at each step as the input to the next, a sequence is formed as follows. Let x = f(n); applying f to x itself gives f(f(n)), written for short as f^2(n). In general, applying the function k times is written f^k(n).

According to the Collatz conjecture, repeated application of this process eventually reaches 1, regardless of which positive integer is chosen initially. In other words, the Collatz conjecture states that for every positive integer n there exists an integer σ(n), however large, such that f^{σ(n)}(n) = 1. This σ(n) is called the Stopping Time.
2. Contemporary Works: Inspiration for our work

Lagarias [2] describes the history of the Collatz problem and surveys the major literature; it is taken as the primary reference for this paper. The observations that inspired this work are presented in the following subsections.

2.1. Collatz Sequence In Base 2. It is well known that computing the Collatz sequence can be viewed as an abstract computer working in base 2 whose only operations are left shift, right shift and addition, as explained below.

The machine performs the following steps on any odd number n until the number reduces to 1:

1: Left shift n by one bit, giving 2n;
2: Add the original number n to the result of step 1 by binary addition (giving 2n + n = 3n);
3: Add 1 to the result of step 2 by binary addition (giving 3n + 1);
4: Remove all trailing 0s (i.e. repeatedly divide by two until the result is odd). This is the right shift operation.
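The four steps can be sketched in Java (the language of the generator listing in Section 3.7) using only BigInteger shifts and additions. This is a minimal illustration, not the authors' program; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Sketch of the base-2 shift-and-add machine of Section 2.1.
// Class and method names are hypothetical, not taken from the paper.
final class BinaryCollatzMachine {

    // One pass of steps 1-4 for a positive odd n.
    static BigInteger oddStep(BigInteger n) {
        if (n.signum() <= 0 || !n.testBit(0)) {
            throw new IllegalArgumentException("n must be a positive odd number");
        }
        BigInteger twoN = n.shiftLeft(1);                       // step 1: left shift
        BigInteger threeN = twoN.add(n);                        // step 2: 2n + n
        BigInteger threeNPlusOne = threeN.add(BigInteger.ONE);  // step 3: 3n + 1
        // step 4: right shift by the number of trailing zero bits
        return threeNPlusOne.shiftRight(threeNPlusOne.getLowestSetBit());
    }

    // Number of passes until the machine reaches 1 (n positive and odd).
    static int passesToOne(BigInteger n) {
        int passes = 0;
        while (!n.equals(BigInteger.ONE)) {
            n = oddStep(n);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(oddStep(BigInteger.valueOf(7)));   // 11, since 3*7 + 1 = 22 = 2 * 11
        System.out.println(passesToOne(BigInteger.valueOf(27)));
    }
}
```

For example, starting from 7 the machine visits 11, 17, 13, 5 and then 1, i.e. five passes.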

It is also known from Lagarias [5] that the odd numbers generated in the sequence appear in an 'almost' random order, so that the convergence can be treated probabilistically.

2.2. General Collatz Class. Several authors (e.g. Moller [3]) have investigated the range of validity of the result that the stopping time is finite for almost all integers n, by considering more general classes of periodically linear functions. One such class consists of all functions of the form

$$
U(n) = \begin{cases} \dfrac{n}{b}, & n \equiv 0 \pmod b \\[4pt] \dfrac{mn - r}{b}, & \text{otherwise, where } r \equiv mn \pmod b \end{cases} \tag{2}
$$

It has been shown in Moller [3] that the sequences generated by (2) converge (have a finite stopping time for almost all n) if and only if

$$
m < b^{\frac{b}{b-1}}.
$$

2.3. Stopping Time. Assuming that the conjecture is true, one can consider the problem of determining the expected stopping time function. Crandall [5] and Shanks [6] were guided by probabilistic heuristic arguments to conjecture that, for a large set of large random integers, the ratio of the stopping time to the natural logarithm of the integer approaches a constant limit C:

$$
\frac{\sigma(n)}{\ln n} \to C \tag{3}
$$

where n is sampled over a large set of large integers.
3. The Work

3.1. Overview. In the present work an intuitive general formula for Collatz-like sequences is described, basing the formula on an abstract computing machine operating in different bases.

Based on the digits on the tape of the abstract machine, we conjecture that Collatz-like problems are examples of digit-wise chaos in arithmetic operations.

To demonstrate this digit-wise chaos, a very simple pseudorandom bit generator based on the Collatz sequence is presented, which passes all of the Diehard battery of randomness tests.

The convergence criterion for Collatz-like sequences then becomes that right-shift operations of the abstract machine are more probable than left-shift operations, so that the number of digits decreases statistically. This reduces the problem to a classical ruin problem on the number of digits.

Finally, based on simple probability, an approximation of the stopping-time ratio in (3) is established.

3.2. The Implicit Base Assumption. Let us rewrite

$$
3n + 1 = 2n + (n + 1)
$$

and generalize it as a left shift followed by the addition of an even number (n + 1 is even, since n is odd). This deletes the last bit of the binary representation of the original number n: 2n and n + 1 are both even, so their sum ends in 0.

In effect, adding n + 1 is the trick that rounds off the last digit of the modified number to 0 in base 2. This insight is confirmed by the observation that

$$
3n + 3 = 2n + (n + 3)
$$

also converges, to 1 or to 3 depending on whether or not n is a power of 2: if it is, the sequence converges to 1; otherwise it converges to 3. It has been known for some time that these general 3n + k forms, where k is odd, are nothing but transforms (scalings) of the original 3n + 1 sequence.
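A minimal Java sketch of the 3n + 3 variant, useful for checking this claim on small inputs. The stopping rule (halt at the first visit to 1 or 3) and the step cap are our assumptions, and the class name is hypothetical.

```java
// Sketch of the 3n + 3 variant: odd n -> 2n + (n + 3) = 3n + 3, even n -> n / 2.
// Iteration stops at the first visit to 1 or 3. Names are hypothetical.
final class ThreeNPlusThree {

    // Returns 1 or 3, or -1 if neither is reached within maxSteps.
    static long endPoint(long n, int maxSteps) {
        for (int i = 0; i < maxSteps; i++) {
            if (n == 1 || n == 3) {
                return n;
            }
            n = (n % 2 == 0) ? n / 2 : 2 * n + (n + 3);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(endPoint(16, 1000)); // 1 (16 is a power of 2)
        System.out.println(endPoint(10, 1000)); // 3
    }
}
```

For instance, 10 follows 10, 5, 18, 9, 30, 15, 48, 24, 12, 6, 3. Since 1 can only be entered from 2, and no odd m gives 3m + 3 equal to a power of 2, only powers of 2 end at 1.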

Based on these observations, we suggest that the Collatz sequence has base 2 implicit within it. The decomposition 3n + 1 = 2n + (n + 1) is a clue to this insight.

If this "theory of implicit base" holds, then in any base there should exist a general formula for a sequence based only on the left-shift and right-shift paradigm.

3.3. The Generalized Collatz Formula. We found that if such a general formula exists, it has to be of the form:

$$
f(n) = \begin{cases} \dfrac{n}{b}, & n \equiv 0 \pmod b \\[4pt] (b + 1)n + (b - n \bmod b), & \text{otherwise} \end{cases} \tag{4}
$$

The (b + 1)n + (b − n mod b) part is, of course, bn + (n + (b − n mod b)). This is the basic algorithm that evolved from the concept of left and right shift operations in an abstract machine computing the Generalized Collatz sequence in base b.
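The following Java sketch implements one step of (4) and follows a trajectory until it enters a cycle. It is an illustration under stated assumptions: the "convergence point" of a trajectory is taken to be the smallest member of the cycle it falls into, long arithmetic limits it to small start values, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the Generalized Collatz Sequence (4) in base b.
// Reporting the cycle minimum as the convergence point is an assumption.
final class GeneralizedCollatz {

    // n / b if b divides n, otherwise (b + 1) n + (b - n mod b).
    static long step(long n, long b) {
        long r = n % b;
        return (r == 0) ? n / b : (b + 1) * n + (b - r);
    }

    // Iterates until a value repeats, then returns the smallest value on that cycle.
    static long cycleMinimum(long n, long b) {
        Set<Long> seen = new HashSet<>();
        while (seen.add(n)) {          // add() returns false once a value repeats
            n = step(n, b);
        }
        long min = n;                  // n is now on the cycle; walk once around it
        for (long m = step(n, b); m != n; m = step(m, b)) {
            min = Math.min(min, m);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(step(7, 2));          // 22 = 3 * 7 + 1
        System.out.println(cycleMinimum(7, 3));  // 7
    }
}
```

In base 3, for example, 7 lies on its own cycle (7, 30, 10, 42, 14, 57, 19, 78, 26, 105, 35, 141, 47, 189, 63, 21), consistent with the {1, 7} convergence points reported for base 3 in Section 4.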
This sequence is in fact similar to the class studied in Moller [3], given in (2), with m = b + 1. We note that

$$
b + 1 < b^{\frac{b}{b-1}}, \quad \forall\, b \ge 2,
$$

so the convergence criterion is satisfied.
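For completeness, a short derivation that m = b + 1 satisfies Moller's criterion b + 1 < b^{b/(b−1)} for every base b ≥ 2, using only e^x > 1 + x for x > 0 and ln b ≥ 1 − 1/b:

```latex
b^{\frac{b}{b-1}} = b \cdot e^{\frac{\ln b}{b-1}}
  > b\left(1 + \frac{\ln b}{b-1}\right)
  = b + \frac{b \ln b}{b-1}
  \ge b + \frac{b\,(1 - 1/b)}{b-1}
  = b + 1 .
```

For b = 2 this reads 3 < 4, and for b = 3 it reads 4 < 3^{3/2} ≈ 5.196.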

The findings on f(n) are summarized in Table 1.

Table 1. Results of the Tests on Generalized Collatz Sequence

Base (b)   Convergence Points   Tested Up-to
2          1                    Maximum Integer
3          1, 7, ?              100000
5          1                    1354093827
7          1                    386581748
11         1, 642, ?            100000
13         1                    24000000000
17         1, 79, ?             100000
19         1                    340000000
23         1, 82, ?             100000
29         1, 111, ?            100000
31         1, 389, ?            100000
37         1                    100000000
41         1                    99999999
43         1                    99999999


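3.4. A Simpler Collatz Sequence. A simpler and computationally faster formulation of a Collatz-type sequence is:

$$
f(n) = \begin{cases} \dfrac{n}{b}, & n \equiv 0 \pmod b \\[4pt] \left\lfloor \dfrac{(b+1)n}{b} \right\rfloor = n + \left\lfloor \dfrac{n}{b} \right\rfloor, & \text{otherwise} \end{cases} \tag{5}
$$

This can be computed very quickly on a digital computer, because the second branch needs no modulus operation. Our original generic sequence formula (4), with one division by b applied to the result of its second branch, is nothing but (since n − (n mod b) = b⌊n/b⌋):

$$
f(n) = \begin{cases} \dfrac{n}{b}, & n \equiv 0 \pmod b \\[4pt] \left\lceil \dfrac{(b+1)n}{b} \right\rceil = n + \left\lfloor \dfrac{n}{b} \right\rfloor + 1, & \text{otherwise} \end{cases}
$$

The simplified actual Collatz sequence can then be written as

$$
f(n) = \begin{cases} \dfrac{n}{2}, & n \equiv 0 \pmod 2 \\[4pt] \left\lceil \dfrac{3n}{2} \right\rceil = \dfrac{3n + 1}{2}, & \text{otherwise} \end{cases} \tag{6}
$$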

Table 2 shows some of the results of the experiments on small bases.

Table 2. Some Results on Simplified Generalized Collatz Sequence

Base   Convergence Points    Tested Up-to
2      1, 5, 17, ?           100000
3      1, 2, 22, ?           100000
5      1, 2, 3, 4, 57, ?     100000
7      1, 2, 3, 4, 5, 6, ?   100000
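A Java sketch of the simplified sequence (5), reusing the cycle-minimum convention assumed above for (4) (our assumption; names hypothetical), which can be used to re-check the non-trivial entries of Table 2 on small inputs:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the simplified sequence (5): n / b if b divides n,
// otherwise n + floor(n / b) (plain integer division, no remainder needed).
final class SimplifiedCollatz {

    static long step(long n, long b) {
        return (n % b == 0) ? n / b : n + n / b;
    }

    // Smallest member of the cycle the trajectory of n falls into.
    static long cycleMinimum(long n, long b) {
        Set<Long> seen = new HashSet<>();
        while (seen.add(n)) {
            n = step(n, b);
        }
        long min = n;
        for (long m = step(n, b); m != n; m = step(m, b)) {
            min = Math.min(min, m);
        }
        return min;
    }

    public static void main(String[] args) {
        long[][] baseAndStart = { {2, 5}, {2, 17}, {3, 22}, {5, 57} };
        for (long[] p : baseAndStart) {
            System.out.println("base " + p[0] + ", start " + p[1]
                    + " -> cycle minimum " + cycleMinimum(p[1], p[0]));
        }
    }
}
```

With this convention the sketch returns 5 and 17 for base 2, 22 for base 3 and 57 for base 5; in base 3, for instance, the trajectory of 22 is the cycle 22, 29, 38, 50, 66, 22.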
3.5. Pseudorandom Sequence Generator and The Generalized Collatz Sequence. A Linear Congruential Generator (LCG) is one of the oldest and best-known algorithms for generating pseudorandom numbers. The generator is defined by the recurrence relation

$$
X_{n+1} = (aX_n + c) \bmod m
$$

where:

m, m > 0, is the modulus
a, 0 < a < m, is the multiplier
c, 0 ≤ c < m, is the increment
X_0, 0 ≤ X_0 < m, is the start value
n ≥ 0 is the iteration number

Assume we are in base 2 and have a tape holding k bits of information. Treat the tape as a register with k initialized bits, and assume also that d_i = 0 for all i > k.

Then we can define the recurrence relation between time t (the present) and t + 1 (the next time interval) as follows:

$$
\begin{aligned}
d_1(t+1) &= d_1(t) \\
d_2(t+1) &= \big(d_1(t+1) + d_2(t) + 1\big) \bmod 2 \\
d_3(t+1) &= \big(d_2(t+1) + d_3(t)\big) \bmod 2 \\
&\;\;\vdots \\
d_{k-1}(t+1) &= \big(d_{k-2}(t+1) + d_{k-1}(t)\big) \bmod 2 \\
d_k &= \text{carry digit}
\end{aligned}
$$

Specifically,

$$
d_{k-1}(t+1) = \Big(\sum_{i=1}^{k-1} d_i(t)\Big) \bmod 2.
$$

We can see that this is nothing but an LCG applied on d_i, (i − 1) times, to generate the d_i bit. Specifically:

$$
d_{k-1}(t+1) = \Big(d_{k-1}(t) + \sum_{i=1}^{k-2} d_i(t)\Big) \bmod 2
$$

This is an LCG with a = 1 and c = Σ_{i=1}^{k−2} d_i(t), where c itself is generated by the same LCG on the lower bits.

But this recurrence relation is the prescription of the operation 3n + 1 on binary digits in an abstract machine. Replacing mod 2 with mod b gives the prescription for calculating (b + 1)n + (b − n mod b) in base b on an abstract machine, with the set of LCGs

$$
d_{k-1}(t+1) = \Big(d_{k-1}(t) + \sum_{i=1}^{k-2} d_i(t)\Big) \bmod b \tag{7}
$$
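The digit-level view can be made concrete with a small Java sketch of the abstract machine in base b. Unlike the carry-free recurrence above, this sketch propagates the carry explicitly, so the tape always holds the exact value of (b + 1)n + (b − n mod b); the helper names are hypothetical and digits are stored least-significant first.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch of the base-b tape computing b*n + n + (b - d_0) digit by digit.
// Digits are stored least-significant first; names are hypothetical.
final class DigitMachine {

    static List<Integer> toDigits(BigInteger n, int b) {
        List<Integer> digits = new ArrayList<>();
        BigInteger base = BigInteger.valueOf(b);
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(base);
            digits.add(qr[1].intValue());
            n = qr[0];
        }
        return digits;
    }

    static BigInteger fromDigits(List<Integer> digits, int b) {
        BigInteger n = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(b);
        for (int i = digits.size() - 1; i >= 0; i--) {
            n = n.multiply(base).add(BigInteger.valueOf(digits.get(i)));
        }
        return n;
    }

    // Shifted copy (d_{i-1} at position i) + original (d_i) + rounding term (b - d_0).
    static List<Integer> gcsOddStep(List<Integer> d, int b) {
        int k = d.size();
        List<Integer> out = new ArrayList<>();
        out.add(0);          // position 0: 0 + d_0 + (b - d_0) = b, so digit 0 ...
        int carry = 1;       // ... and a carry of 1
        for (int i = 1; i <= k; i++) {
            int sum = d.get(i - 1) + (i < k ? d.get(i) : 0) + carry;
            out.add(sum % b);
            carry = sum / b; // sum <= 2b - 1, so the carry is never more than 1
        }
        if (carry > 0) {
            out.add(carry);
        }
        return out;
    }

    public static void main(String[] args) {
        int b = 3;
        BigInteger n = BigInteger.valueOf(7);   // 7 = (21) in base 3
        System.out.println(fromDigits(gcsOddStep(toDigits(n, b), b), b)); // 30
    }
}
```

For n = 7 in base 3 the tape (21)_3 becomes (1010)_3 = 30, which matches 4 · 7 + (3 − 1).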

This observation is in line with the view of Feinstein [1] that the standard mathematical way of proving this conjecture may not be possible. Note also that this observation moves the Collatz problem from number theory to chaos theory, and indicates that all generations of numbers produced by the Collatz function behave randomly. The similarity with cellular automata is apparent; it is discussed in Wolfram [4] with a cellular automaton in base 6 that simulates the original Collatz function.

It is also stated in Wolfram [4] that arithmetic operations can exhibit digit-wise chaos, and we hypothesize that convergence of generalized Collatz-type sequences might well be statistical, with the system behaving rather chaotically at larger numbers of digits.

To test the hypothesis, we passed in numbers with highly ordered binary representations, for example:

• n = 2^k − 1: this immediately reduces to 3^k − 1, and an elementary proof exists for this reduction. Combining the odd step of (1) with the halving that follows it, the map can be written as (3/2)(n + 1) − 1 when n is odd, and hence f^k(n) = (3/2)^k (n + 1) − 1 if all k iterations produce odd numbers. With n + 1 = 2^k, the reduction to 3^k − 1 follows immediately.

• n = 2^k + 1, where 3n = 2^{k+1} + 2^k + 2 + 1. In binary form this looks like 110000...00011. If the "11"s on the left and right are far enough apart that they are isolated in the next 3n + 1 operation (i.e. the carry does not reach the leftmost "11"), then it reduces to 100100...001, reducing n_0 by 2 and increasing n_1 by 1. For example: 1000001 → 1100011 → 100101000 → 100101.

• n = 1111111000000000001, where the 1s and 0s are in a highly ordered state.

3.6. A Random Experiment. Motivated by the idea that n + (b − n mod b) is nothing but a random seed which, when added to bn, randomizes the abstract machine's tape, we tried to verify this experimentally. A program was written in which we replaced the generalized Collatz function by

$$
f(n) = bn + R[n]
$$

where R[n] is a random integer divisible by b, and n and R[n] (when represented in base b) have the same number of digits. The same-number-of-digits assumption was made to be consistent with the original Collatz problem, where R[n] = n + 1.

We tested this hypothesis in base 2, and the results were positive: after a long but finite run, numbers do converge to 1 under the generalized randomized Collatz sequence. We also note that adding a fixed even number from the range R[n] ∈ {2, 4, 6, ..., 2(2^{k−1} − 1)} at every step destroys the chaotic pattern and makes the sequence uniformly convergent. This was tested with the largest even number in the set, 2(2^{k−1} − 1).
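A minimal Java sketch of the randomized base-2 experiment. The paper does not state how R[n] is drawn; here it is assumed to be uniform over the even numbers with the same bit length as n, and the seed, step cap and names are hypothetical. Trailing zeros are removed after each step, as in the binary machine of Section 2.1.

```java
import java.math.BigInteger;
import java.util.Random;

// Sketch of f(n) = 2n + R[n] in base 2, followed by removal of trailing zeros.
// The uniform choice of R[n] is an assumption; names are hypothetical.
final class RandomizedCollatz {

    // Requires odd n >= 3, so that n has k >= 2 bits.
    static BigInteger randomizedStep(BigInteger n, Random rng) {
        int k = n.bitLength();
        BigInteger low = BigInteger.ONE.shiftLeft(k - 1);             // 2^(k-1)
        BigInteger offset = new BigInteger(k - 2, rng).shiftLeft(1);  // even, < 2^(k-1)
        BigInteger r = low.add(offset);                               // even, exactly k bits
        BigInteger m = n.shiftLeft(1).add(r);                         // 2n + R[n], even
        return m.shiftRight(m.getLowestSetBit());                     // strip trailing zeros
    }

    // Randomized steps needed to reach 1 from a positive n, or -1 if maxSteps is exceeded.
    static int stepsToOne(BigInteger n, long seed, int maxSteps) {
        Random rng = new Random(seed);
        n = n.shiftRight(n.getLowestSetBit());                        // start from the odd part
        for (int steps = 0; steps <= maxSteps; steps++) {
            if (n.equals(BigInteger.ONE)) {
                return steps;
            }
            n = randomizedStep(n, rng);
        }
        return -1;
    }

    public static void main(String[] args) {
        BigInteger start = new BigInteger(512, new Random(42)).setBit(511).setBit(0);
        System.out.println(stepsToOne(start, 7L, 1_000_000));
    }
}
```

For n = 3 the only admissible R[n] is 2, so the step 3 → 8 → 1 is deterministic; larger inputs follow a seed-dependent path.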

3.7. A Binary Pseudorandom Digit Generator. To show that the binary digits generated by the Collatz sequence are bitwise random, we wrote a pseudorandom digit generator based on the Collatz sequence.

import java.math.BigInteger;
import java.util.Arrays;

// The enclosing class name is hypothetical; the paper shows only the method.
final class CollatzByteGenerator
{
    /***************************************************************
     * Just run the Collatz sequence and store the generated numbers in a
     * byte array, noting that we only store the result of 3/2 * (n+1) - 1,
     * i.e. (3n + 1) / 2.
     * Also, we do not go all the way to convergence,
     * because then the output would be too predictable.
     * The Collatz sequence apparently has 3 domains:
     * 1. Randomization - where the input number is shuffled and
     *    randomized
     * 2. Sustained Run - where the randomized tape undergoes chaotic
     *    behaviour
     * 3. Convergence - where it actually converges to 1.
     * In this generator we operate only on [1] and [2].
     * nB must be a positive integer.
     ***************************************************************/
    static byte[] DoBinaryCollatzUntilSmall(BigInteger nB)
    {
        byte[] arr = new byte[0];
        double perc = nB.toByteArray().length;
        byte[] tmp = null;
        /* The 0.8 is the amount of reduction, bytewise.
         * If we let it reduce to 1 byte, then the end bytes would be
         * too predictable, as all numbers converge to 1.
         * The really random parts are in the higher byte-count range,
         * so we choose a parameter and let the number of bytes reduce
         * to 0.8 of the original.
         */
        int reduction = (int) Math.ceil(perc * 0.8);
        BigInteger radixB = new BigInteger("2");
        do
        {
            while (nB.mod(radixB).compareTo(BigInteger.ZERO) == 0)
            {
                nB = nB.divide(radixB);
                /*
                 * Why is this done?
                 * Assume we have generated X000; then the system would
                 * generate X00X0X, clearly a pattern of X.
                 * With this loop we will never generate X again:
                 * if 3(X+1) is Y0, then we would generate XY.
                 */
            }
            BigInteger mod = nB.mod(radixB);     // n mod b (1, since nB is now odd)
            mod = radixB.subtract(mod);          // b - n mod b
            mod = mod.add(nB);                   // n + (b - n mod b)
            nB = nB.multiply(radixB).add(mod);   // bn + n + (b - n mod b) = 3n + 1
            nB = nB.divide(radixB);              // (3n + 1) / 2
            tmp = nB.toByteArray();
            arr = add_to_array(tmp, arr);
            //System.out.println(tmp.length + " : " + reduction);
        } while (tmp.length > reduction);
        return arr;
    }

    // add_to_array is not shown in the paper; this assumed helper
    // appends the bytes of tmp to the end of arr.
    static byte[] add_to_array(byte[] tmp, byte[] arr)
    {
        byte[] out = Arrays.copyOf(arr, arr.length + tmp.length);
        System.arraycopy(tmp, 0, out, arr.length, tmp.length);
        return out;
    }
}



The findings are summarized in Table 3.

Table 3. Results of the Diehard Tests

Test Name                          Resulting p-value[s]                     End Result
Birthday Spacing                   0.913665                                 PASSED
Overlapping 5-Permutation          0.479927, 0.631744                       PASSED
Binary Rank Test (31X31) Matrix    0.337549                                 PASSED
Binary Rank Test (32X32) Matrix    0.568434                                 PASSED
Binary Rank Test (6X8) Matrix      0.963464                                 PASSED
The Bit Stream Test                min 0.05492, max 0.9487                  PASSED
OPSO                               min 0.0492, max 0.9579                   PASSED
OQSO                               min 0.0496, max 0.9801                   PASSED
DNA                                min 0.0631, max 0.9877                   PASSED
Count The 1s On Byte Stream        0.980913, 0.589621                       PASSED
Count The 1s On Specific Bytes     min 0.015312, max 0.99139                PASSED
Parking Lot                        0.177727                                 PASSED
Minimum Distance Test              0.833870                                 PASSED
3D Sphere Test                     0.954914                                 PASSED
Squeeze Test                       0.483180                                 PASSED
Overlapping Sums Test              0.491630                                 PASSED
Runs Test Up                       0.256061, 0.528445                       PASSED
Runs Test Down                     0.485818, 0.751651                       PASSED
Craps Test                         wins: 0.682508, throws/game: 0.268146    PASSED


3.8. General Collatz Sequence As A Random Walk Problem. In light of the results of the random bit generator, the General Collatz Sequence becomes a random walk problem. We define the problem as follows.

Define a one-dimensional discrete space whose points are P(x), x ∈ {1, 2, 3, 4, ...}. Let k be the number of digits in the binary representation of a number n. Clearly k ∈ {1, 2, 3, 4, ...}, so k can be represented as a point in that space. All numbers {x : |x| = k} (where |x| denotes the number of digits of x when expressed in binary) map to the same point k in that space.

If we define the origin of the space as 1, then the distance D(n) from the origin of a number n with k digits is D(n) = k − 1, where |n| = k.

• If a transform increases the number of digits of n, the distance D increases.

• If the number of digits of n remains the same, the distance D remains the same.

• If a transform decreases the number of digits of n, the distance D decreases.

Adding a random even number from the interval [2, 2(2^{k−1} − 1)] does one of 3 things:

Action 1: Effective left shift: a 1 is added on the left side, and the number of digits moves to k + 1.
Action 2: No movement, i.e. k remains as it is.
Action 3: Right shift: effective 0s are padded on the right, and the number of digits moves below k.

Note that for a left shift the step size is only 1, whereas for a right shift the available step sizes are {1, 2, 3, ..., k − 1}. The random walk is therefore defined as D(t + 1) = D(t) + X, where

$$
X \in \{1, 0, -1, -2, \ldots, -(k-1)\}
$$

is a random variable.

The logarithm of a number is essentially the number of digits needed to represent it. The stopping time is then the time taken to reach some lower number of digits, and can be taken as

$$
\sigma(n) \approx -\frac{\log_b(n)}{E(X)} \tag{8}
$$

where E(X) is the statistical expectation of the random variable X.

This is why Collatz-like sequences are known to have a stopping time proportional to the logarithm of the number, as mentioned in Lagarias [2]. Here the relation comes from first principles rather than from the heuristics presented in Crandall [5] and Shanks [6], and it provides a theoretical basis for further study.

We can then also expect that if we randomly sample numbers with a very large number of digits (typically 500+ in smaller bases and 100+ in larger bases) and take the ratio of the average empirical stopping time to the number of digits, the result is approximately a fixed ratio, no matter how large we make the number of digits. Table 4 shows that the expectation values remain fixed as the number of digits grows. In other words, we can empirically estimate E(X) by

$$
E(X) \approx -\frac{\log_b(n)}{\sigma(n)}
$$

and this estimate has a limit as the number of digits grows. This behavior is indeed observed when we analyze generalized Collatz sequences; that experiment is the topic of the next section.

We see that the convergence can be thought of as a "Classical Ruin" problem: you start with some fixed number of digits and at each step either win 1, lose one or more, or stay the same, with probabilities p_l, p_r and p_n. If the expectation is negative, then after a random run the system always ends up with one or more digits fewer than it started with. Hence, starting from k digits, one always eventually reaches k − i, i ∈ {1, 2, 3, ...}.
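The ruin picture can be simulated directly. The Java sketch below assumes, for illustration only, the base-2 step law derived in Section 5: X = +1 with probability 0.4, X = −s with probability (1/3)·2^{−s} for s ≥ 1, and X = 0 otherwise, so that E(X) = 0.4 − 2/3 ≈ −0.267. It compares the mean time to reach the origin with the estimate (k − 1)/(−E(X)) implied by (8); the class and method names are hypothetical.

```java
import java.util.Random;

// Sketch of the digit-count random walk as a classical ruin problem.
// The step law is an illustrative assumption (base-2 values from Section 5).
final class DigitRuinWalk {

    static final double P_LEFT = 0.4;   // probability of X = +1

    // Draws X: +1 w.p. 0.4, -s w.p. (1/3) * 2^-s for s >= 1, otherwise 0.
    static int sampleStep(Random rng) {
        double u = rng.nextDouble();
        if (u < P_LEFT) {
            return 1;
        }
        double acc = P_LEFT;
        double pRight = 1.0 / 6.0;      // (1/3) * 2^-1
        for (int s = 1; s < 64; s++) {
            acc += pRight;
            if (u < acc) {
                return -s;
            }
            pRight /= 2.0;
        }
        return 0;                       // remaining mass (about 0.267) is the neutral step
    }

    // Steps needed to move from distance D = k - 1 to the origin (D <= 0).
    static long hittingTime(int k, Random rng) {
        long d = k - 1;
        long t = 0;
        while (d > 0) {
            d += sampleStep(rng);
            t++;
        }
        return t;
    }

    static double meanHittingTime(int k, int runs, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < runs; i++) {
            sum += hittingTime(k, rng);
        }
        return sum / runs;
    }

    public static void main(String[] args) {
        int k = 1000;
        double expectation = P_LEFT - 2.0 / 3.0;   // E(X) = 0.4 - sum_s s (1/3) 2^-s
        System.out.printf("simulated mean time: %.1f%n", meanHittingTime(k, 200, 1L));
        System.out.printf("(k - 1) / -E(X):     %.1f%n", (k - 1) / -expectation);
    }
}
```

Because E(X) is negative, every simulated run terminates, and the sample mean should sit close to (k − 1)/(−E(X)), which is about 3746 steps for k = 1000.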

4. Generalized Collatz Sequence At Large Number of Random Digits

We define the probability

$$
P(b, s) = \lim_{x \to \infty} P(b, x, s)
$$

as the probability of moving s steps when the number of digits x approaches infinity, where s ∈ {1, 0, −1, −2, ...}. For brevity we drop x and write P(b, s).
The Collatz sequence does not sample the most significant digits uniformly: whenever it left-shifts, out of the possible leading-digit pairs

$$
\{1, 2, 3, \ldots, b-1\} \times \{0, 1, 2, 3, \ldots, b-1\}
$$

it can expand only to (10)_b as the two most significant digits.

We did experiments to find the probabilities of left shift and right shift of the Collatz system. They were done by feeding a random digit stream to the Generalized Collatz sequence as the integer input.

There are never more than two convergence points, even for integers of 1000 or more digits. For example, base 3 has the convergence points {1, 7}, and even for 1000-digit integers the system still converges only to these two points.

For each fixed number of digits, 50 sample numbers were created by choosing each digit randomly. On these 50 numbers the left, neutral (remaining at the same number of digits) and right shift probabilities were observed and averaged.

The rationale for choosing each digit randomly is to ensure that the Collatz system does not have to do the randomization itself.

Table 4 shows the system dynamics in different bases and for different numbers of digits. (The values may add up to more than 1.00, as we chose the most frequently occurring value for each entry.)



Table 4. P(b,s) Empirical Data - Showing Probability of Left/Right Movement

Base   Digits        P(b,1)    P(b,0)    P(b,-Any)   Expectation
2      1000 - 5000   0.39090   0.27657   0.33252     -0.27149
3      500 - 1000    0.19704   0.55520   0.24775     -0.17518
5      500 - 1000    0.09451   0.73837   0.16678     -0.11436

One key point to note here is that we modified the original sequence. When a number of the form n = p b^s with p mod b ≠ 0 occurs, instead of treating the reduction of this number to p as s steps, we treated it as a single step: an s-step right shift.

Hence there exists a P(b, −s), the probability of occurrence of a number of the form n = p b^s.

Table 5 shows P(b, −s) in different bases.

Table 5. P(b,-s) Empirical Data - Showing Probability Of Various Right Shift Movements

Base   P(b,-1)   P(b,-2)   P(b,-3)
2      0.16699   0.08358   0.04117
3      0.16500   0.05526   0.01851
5      0.13310   0.02706   0.00527
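The measurement behind Tables 4 and 5 can be sketched in Java as below. The step bookkeeping is our assumption: a non-multiple of b is mapped by (4) followed by one division by b, i.e. to n + ⌊n/b⌋ + 1, and a multiple of b loses its whole run of s trailing zero digits in a single move of −s, as described above. Names, sizes and the seed are hypothetical.

```java
import java.math.BigInteger;
import java.util.Random;
import java.util.TreeMap;

// Sketch of tallying digit-count moves of the generalized sequence in base b.
// The step bookkeeping is an assumption; names are hypothetical.
final class WalkFrequencies {

    static int digitCount(BigInteger n, int b) {
        return n.toString(b).length();
    }

    // Tally of moves (+1, 0, -1, -2, ...) over at most `steps` moves,
    // starting from a number with `digits` random base-b digits.
    static TreeMap<Integer, Integer> tally(int b, int digits, int steps, long seed) {
        Random rng = new Random(seed);
        BigInteger base = BigInteger.valueOf(b);
        StringBuilder sb = new StringBuilder();
        sb.append(Character.forDigit(1 + rng.nextInt(b - 1), b)); // nonzero leading digit
        for (int i = 1; i < digits; i++) {
            sb.append(Character.forDigit(rng.nextInt(b), b));
        }
        BigInteger n = new BigInteger(sb.toString(), b);
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int t = 0; t < steps && n.compareTo(base) > 0; t++) {
            int move = 0;
            if (n.mod(base).signum() == 0) {
                // a run of s trailing zero digits counts as a single move of -s
                while (n.mod(base).signum() == 0) {
                    n = n.divide(base);
                    move--;
                }
            } else {
                // step of (4) followed by one division by b: n + floor(n/b) + 1
                int before = digitCount(n, b);
                n = n.add(n.divide(base)).add(BigInteger.ONE);
                move = digitCount(n, b) - before;
            }
            counts.merge(move, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> c = tally(2, 2000, 2000, 1L);
        int total = c.values().stream().mapToInt(Integer::intValue).sum();
        c.forEach((move, count) ->
                System.out.printf("move %+d: %.4f%n", move, count / (double) total));
    }
}
```

The printed relative frequencies of +1, 0 and each −s can then be compared with the columns of Tables 4 and 5.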



a general class of collatz sequence and ruin problem 11 

5. Collatz Probability Formulae For Base 2 - The Original Collatz Sequence

In the earlier section, we presented the probabilities as they came from the data itself. We now show how they are to be expected from first principles.

Theorem 5.1. Total Right Shift Probability.

The total right shift probability is

$$
p_r(b) = \frac{1}{b+1}
$$

Proof of Total Right Shift Probability Theorem. The Collatz system is a 3-state system: it can do a left shift (increasing the number of digits), remain neutral (no change in the number of digits), or do a right shift (decreasing the number of digits by 1 or more). This is a discrete-step process. Assume that at the k'th step the probability of a right shift is p_r(b, k). Then the probability that the system did not do a right shift at the k'th step is

1 − p_r(b, k)

Now, assuming the Collatz transform generates the less significant digits with a uniform distribution, the probability that a digit is 0 is

P(d = 0) = 1/b,

which means that the probability of having trailing zeros at the (k+1)'th step is

$$
p_r(b, k+1) = \big(1 - p_r(b, k)\big) \times \frac{1}{b}
$$

Now, at equilibrium we would expect that

p_r(b, k) ≈ p_r(b, k+1) ≈ p_r(b)

Substituting into the previous equation gives p_r(b) = (1 − p_r(b))/b, and solving for p_r(b) immediately gives

$$
p_r(b) = \frac{1}{b+1} \tag{9}
$$

This establishes the theorem. □

Theorem 5.2. Right Shift Probabilities. The right shift probabilities P(b, −s) are geometrically distributed.

Proof of Right Shift Probabilities Theorem. We show here that P(b, −s) has the formula

$$
P(b, -s) = \frac{b-1}{b+1}\, b^{-s} \tag{10}
$$

We start by noting that P(b, −s) is the probability of having exactly s "0"s at the end of a number fed into the right shift operation of the abstract machine. Here we note that

$$
P(b, -k-1) = \frac{1}{b}\, P(b, -k)
$$

since to generate a number padded with k + 1 "0"s from one padded with k "0"s, an additional "0" must be added on the right, which is only 1 out of the b possibilities {0, 1, 2, 3, ..., b − 2, b − 1}.

We now deduce P(b, −1). Assume a number X = d_{n−1}d_{n−2}...d_1d_0. It undergoes a single right shift iff the system is doing a right shift and d_1 is nonzero. Formally,

P(b, −1) = P(d_1 ≠ 0 AND 'System Doing Right Shift')

and, as these are independent,

P(b, −1) = P(d_1 ≠ 0) × p_r(b)

However, we clearly know that

P(d_1 ≠ 0) = (b − 1)/b

Hence P(b, −1), the probability that only a single right shift takes place, becomes

$$
P(b, -1) = \frac{b-1}{b} \times \frac{1}{b+1} = \frac{b-1}{b+1}\, b^{-1}
$$

This establishes the theorem, once we note that P(b, −k − 1) = (1/b) P(b, −k). This is also clear from Table 5. □

We now consider a population of original Collatz sequences, defined to be

{..., f(x), ..., f^n(x)}

We note that in any f^i(x), the two most significant digits can be either "10" or "11". But as every left shift generates "10", we can be sure that in the population "10" occurs more often than "11". Assume also that we are concerned only with the digit-expansion operation, as the digit-contraction operation (right shift) does not change the MSBs. We now derive the proportion of time the system spends in the "11" and "10" states.

We note that a system with prefix "11" always becomes "10" in the next iteration, while "10" becomes "11" if and only if there is no carry from the lower digits.

We now deduce the probability of the carry.

Theorem 5.3. Probability of carry to the MSB is a constant for the Collatz Sequence.

The probability of a carry to the MSB is 1/3.

Proof of Carry Theorem. If the number can be written as d_{k−1}d_{k−2}...d_0, we can effectively write it as

d_{k−1}d_{k−2}...d_0 = d_{k−1}d_{k−2}X

where X is itself a number which is NOT divisible by 2. Hence X is an odd number of at most k − 2 digits, with no restriction on leading 0s.

Now we note that for the combination (d_{k−1}, d_{k−2}) = (1, 0), the resulting number looks like

10X0 + 10X + 1

That is to say, to influence d_{k−1} = 1, the number 3X + 1 has to have |X| + 2 digits.

If we consider odd numbers X from 1 to 2^k − 1 and check how many times applying the generalized Collatz transform increases the number of digits by 2, we obtain the number of cases in which a carry is generated.

Assume the number is d_{k−1}d_{k−2}X_{k−2}. If (d_{k−1}, d_{k−2}) = (1, 1), there is surely a carry. If (d_{k−1}, d_{k−2}) = (1, 0), there is a carry iff X_{k−2} itself produces a carry. Hence the recurrence relation

$$
p_c(k) = \frac{1}{4} + \frac{1}{4}\, p_c(k-2) \tag{11}
$$

It is not hard to see that, at large k, the recurrence relation yields p_c(k) ≈ p_c(k − 2) ≈ p_c. Substituting into (11) gives p_c = 1/4 + p_c/4, and from there

$$
P(\text{Carry}) = p_c \approx \frac{1}{3} \tag{12}
$$

This completes our proof. □



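Theorem 5.4. The probabilities of the '10' and '11' states are constants.

The probabilities of occurrence of the '10' and '11' states are 0.60 and 0.40 respectively.

Proof of State Density Theorem. Let us define the symbol A_xN_y to mean that the system was earlier in state 'x' and is now in state 'y', and let N_x(t) be the number of members of the population in state 'x' at time t. Since "11" always becomes "10", we clearly have

$$
P(10 \mid 11) = \frac{A_{11}N_{10}(t+1)}{N_{11}(t)} = 1
$$

And, as "10X0" added to "10X" generates a "11" prefix iff there is no carry,

$$
P(11 \mid 10) = \frac{A_{10}N_{11}(t+1)}{N_{10}(t)} = 1 - p_c
$$

In the equilibrium state the two transitions balance:

$$
A_{11}N_{10}(t) = A_{10}N_{11}(t) \tag{13}
$$

The following equations immediately follow:

$$
\frac{N_{11}(t)}{N_{10}(t)} = 1 - p_c \approx \frac{2}{3} \tag{14}
$$

$$
\frac{N_{11}(t)}{N_{10}(t) + N_{11}(t)} = \frac{1 - p_c}{1 + (1 - p_c)} \approx \frac{2}{5}, \qquad \frac{N_{10}(t)}{N_{10}(t) + N_{11}(t)} = \frac{1}{1 + (1 - p_c)} \approx \frac{3}{5} \tag{15}
$$

This completes our proof. □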
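The MSB-state fractions can be estimated with a short Java sketch. Assumptions (ours, not the paper's stated procedure): the trajectory uses the shortcut odd step (3n + 1)/2, trailing zeros are stripped in one move, and the two leading bits are recorded after each odd step, when the number of digits can expand. Names and sizes are hypothetical.

```java
import java.math.BigInteger;
import java.util.Random;

// Sketch for estimating the "10" / "11" MSB-state fractions along a trajectory.
// Class and method names are hypothetical.
final class MsbStates {

    // True when the two most significant bits are "11".
    static boolean topTwoBitsAre11(BigInteger n) {
        return n.bitLength() >= 2 && n.testBit(n.bitLength() - 2);
    }

    // Fraction of recorded odd steps whose result starts with "11";
    // stops early once n has 64 bits or fewer, to stay in the large-number regime.
    static double fraction11(BigInteger n, int maxOddSteps) {
        int recorded = 0;
        int count11 = 0;
        while (recorded < maxOddSteps && n.bitLength() > 64) {
            if (n.testBit(0)) {
                n = n.multiply(BigInteger.valueOf(3)).add(BigInteger.ONE).shiftRight(1);
                recorded++;
                if (topTwoBitsAre11(n)) {
                    count11++;
                }
            } else {
                n = n.shiftRight(n.getLowestSetBit());
            }
        }
        return recorded == 0 ? 0.0 : count11 / (double) recorded;
    }

    public static void main(String[] args) {
        BigInteger start = new BigInteger(20000, new Random(3)).setBit(19999);
        System.out.println(fraction11(start, 20000));
    }
}
```

Under Theorem 5.4 the printed fraction of "11" states is expected to be near 0.4.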



We end this discussion by noting that the empirical data show the population fractions of the "11" and "10" states (N_11 and N_10) to be approximately 0.40667 and 0.5930.

Now we derive the left shift probability of the base 2 Collatz system, that is, the probability that the number of digits increases by 1.
Theorem 5.5. Left Shift Probability of Collatz Sequence.

The left shift probability of the Collatz sequence is approximately 0.4.

Proof of Left Shift Probability Theorem. The system can left-shift only when the number has no trailing 0s. So, denoting the probability of a right shift by

P(Right Shift) = p_r,

the probability of a left shift of the system is

$$
P(\text{Left Shift}) = p_l = (1 - p_r)\big(P_L(10)\,P(10) + P_L(11)\,P(11)\big)
$$

where P(10) ≈ 0.6 and P(11) ≈ 0.4 are the probabilities that the system is in the "10" and "11" MSB states, and P_L(x) is the probability of a left shift from state x. From state "11" the system always increases the number of digits when the number is odd, so P_L(11) = 1; from state "10" it does so only when there is a carry to the MSB, so P_L(10) = p_c. The right shift probability from (9) is

$$
p_r(2) = p_r = \frac{1}{3} \approx 0.33 \tag{16}
$$

That means

$$
p_l \approx (1 - p_r)(0.4 + 0.6\,p_c) \approx 0.6\,(1 - p_r) \approx 0.40 \tag{17}
$$

while the empirically found value was 0.39. This establishes the theorem. □

Now we find the expectation value for base 2.

Theorem 5.6. Expectation of Collatz Random Walk is constant.

The expectation of the Collatz random walk is approximately −0.270.

Proof of Expectation Value. We write it as

$$
E(X) = 1 \times p_l + 0 \times p_n + \sum_{s \ge 1} (-s)\, P(b,-s) = p_l - \frac{b-1}{b+1} \sum_{s \ge 1} s\, b^{-s}
$$

For b = 2 the rightmost term becomes

$$
\frac{2-1}{2+1} \sum_{s \ge 1} s\, 2^{-s} = \frac{1}{3} \times 2 = \frac{2}{3}
$$

so that

$$
E(X) \approx 0.40 - 0.667 \approx -0.267 \tag{18}
$$

This proves the theorem. □
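The rightmost term has a closed form for every base b. A short derivation, assuming only P(b, −s) = ((b − 1)/(b + 1)) b^{−s} from (10) and the standard series Σ_{s≥1} s x^s = x/(1 − x)^2 for |x| < 1:

```latex
\sum_{s \ge 1} P(b,-s) = \frac{b-1}{b+1} \sum_{s \ge 1} b^{-s}
  = \frac{b-1}{b+1} \cdot \frac{1}{b-1} = \frac{1}{b+1} = p_r(b)

\sum_{s \ge 1} s\, P(b,-s) = \frac{b-1}{b+1} \cdot \frac{1/b}{(1 - 1/b)^2}
  = \frac{b-1}{b+1} \cdot \frac{b}{(b-1)^2} = \frac{b}{b^2 - 1}

E(X) = p_l - \frac{b}{b^2 - 1}
```

The first line shows that (10) is consistent with (9). For b = 2 the second gives 2/3, as used above; for b = 3 and b = 5 it gives 3/8 and 5/24, and combining these with the measured P(b,1) of Table 4 yields E(X) ≈ −0.178 and ≈ −0.114, close to the tabulated −0.17518 and −0.11436.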

Theorem 5.7. Expected Stopping Time of Collatz Sequence is a constant times the number of bits.

The expected runtime (stopping time) of the Collatz sequence is approximately 3.7 times the number of bits.

Proof of Expected Stopping Time. By (8) with b = 2 and (18), the expected time to converge for random digits is

$$
\sigma_2(n) \approx -\frac{\log_2(n)}{E(X)} \approx \frac{\log_2(n)}{0.267} \approx 3.7 \log_2(n) \tag{19}
$$

□
This is the linearity with respect to log(n) that we see for large numbers n. The empirically observed value is σ_2(n) ≈ 3.60 log_2(n).
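A Java sketch for estimating the constant in (19) empirically. One walk step is either the shortcut odd step (3n + 1)/2 or the removal of a whole run of trailing zeros, matching the single-step right-shift convention of Section 4; the bit length stands in for log_2(n), and the sample sizes, seed and names are hypothetical.

```java
import java.math.BigInteger;
import java.util.Random;

// Sketch for estimating walk steps per bit on random inputs.
// Class and method names are hypothetical.
final class StoppingTimeSlope {

    // Walk steps from a positive n down to 1.
    static long walkSteps(BigInteger n) {
        long steps = 0;
        while (!n.equals(BigInteger.ONE)) {
            if (n.testBit(0)) {
                n = n.multiply(BigInteger.valueOf(3)).add(BigInteger.ONE).shiftRight(1);
            } else {
                n = n.shiftRight(n.getLowestSetBit());   // whole zero run in one step
            }
            steps++;
        }
        return steps;
    }

    // Mean of walkSteps(n) / bits over random numbers with exactly `bits` bits.
    static double meanSlope(int bits, int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < samples; i++) {
            BigInteger n = new BigInteger(bits, rng).setBit(bits - 1);
            sum += walkSteps(n) / (double) bits;
        }
        return sum / samples;
    }

    public static void main(String[] args) {
        System.out.println(meanSlope(2000, 20, 1L));
    }
}
```

On large random inputs the average ratio is expected to land near the empirically observed 3.60 quoted above.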

6. Summary And Further Work

We can surely say that Collatz-like sequences are built on top of the chaos generated by the implicit pseudorandom generator bn + n + (b − n mod b), according to (7). We also noted that adding fixed even numbers from the range {2, 4, ..., 2(2^{k−1} − 1)} (even when chosen at random) at every step destroys the chaotic pattern.

We have shown empirically that the purely random sequence terminates, and we proved that statistical convergence exists by enumerating the possibilities of left and right shifts and showing that the resulting expectation is always negative. We also derived several theorems from basic probability principles, which show remarkable agreement with the experimental data.

Further research is needed on this specific type of pseudorandom sequence generator, on predicting the bases in which the system converges to 1, and on how many attractor/convergence points the system should have.

We end with a quote from the great von Neumann: "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." We humbly beg to differ, by stating that the Generalized Collatz Sequence can be used as a fantastic random symbol generator, passing all but one Diehard test (it fails the minimum distance test) and almost all NIST tests. As the Simplified Generalized Collatz Sequence (5) does not need any arithmetic operation other than addition, it is a good candidate for pseudorandom sequence generation in computers.